Internal Embedding Service

Overview

The internal embedding service uses Xenova-based models. These are open-source, JavaScript-compatible machine learning models designed to generate vector embeddings from text (and sometimes images). These embeddings are numerical representations that capture the semantic meaning of content, enabling a wide range of AI-powered applications.

Because the service runs on the Qarbine host, it offers privacy, low latency, and offline operation. The models are open source, making them cost-effective for developers and organizations that want to avoid proprietary or paid embedding APIs. A variety of models is supported, with different vector sizes and language capabilities, including multilingual options. Models such as Xenova/all-MiniLM-L6-v2 and Xenova/gte-base offer fast inference times (as low as ~170 ms per embedding), making them suitable for real-time applications.
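To illustrate how embedding vectors are actually used, the sketch below compares two vectors with cosine similarity, the usual comparison for semantic search. The vectors here are hypothetical 4-dimensional examples standing in for the 384- or 768-dimensional output a real model would produce; the function itself is plain Node with no dependencies.

```javascript
// Cosine similarity between two embedding vectors: values near 1 mean
// the underlying texts are semantically similar, values near 0 mean
// they are unrelated.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical embeddings; real models return 384 or 768 values per text.
const docVector   = [0.12, 0.87, 0.05, 0.44];
const queryVector = [0.10, 0.90, 0.02, 0.40];

console.log(cosineSimilarity(docVector, queryVector)); // close to 1
```

An application would typically embed each document once, store the vectors, and at query time embed the query and rank documents by this similarity score.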

Prerequisites

Instance Requirements

Using this service requires at least 2 GB of memory. Without enough memory, a request will hang the server process, and an out-of-memory (OOM) kill event will likely follow. Messages to this effect will appear in the system log.

Within an SSH shell, the amount of RAM can be determined by running:

free -h

AWS

For AWS, the amount of RAM can be determined by cross-referencing the instance type with the information at https://aws.amazon.com/ec2/instance-types/.

For AWS the basic resizing steps are:

  • In the AWS Management Console, select your instance, go to “Instance State,” and choose “Stop”.
  • Wait for the stop operation to complete.
  • Select the “Instance Settings” menu, then “Change Instance Type”.
  • Choose a new instance type with more memory (RAM).
  • Confirm your selection.
  • Start the instance again from the “Instance State” menu.

Azure

For Azure, the amount of RAM can be determined by cross-referencing the instance type with the information at https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/overview.

For Azure the basic resizing steps are:

  • Open the Azure portal and navigate to your virtual machine.
  • Stop the VM.
  • In the left menu of the VM settings, select Size.
  • Select a new VM size from the list that offers more memory (RAM).
  • Click Resize to apply the new VM size.
  • Start the VM.

The instance will now have the increased memory of the new instance type.

Available Models

This service provides a local embedding feature using Xenova based models. The default model is "Xenova/all-MiniLM-L6-v2". The available models are shown below.

Model Name                                    | Description                    | Multilingual | Vector Size
Xenova/all-MiniLM-L6-v2                       | Fast, default general-purpose  | No           | 384
Xenova/all-mpnet-base-v2                      | Higher quality, larger         | No           | 768
Xenova/paraphrase-multilingual-MiniLM-L12-v2  | Multilingual, good quality     | Yes          | 384
Xenova/all-distilroberta-v1                   | Compact, efficient             | No           | 768
Xenova/paraphrase-albert-small-v2             | Very small, less accurate      | No           | 768

The all-MiniLM-L6-v2 model does not produce good results beyond 128 tokens and can handle a maximum of 256 tokens, so it is best suited to sentences or small paragraphs.
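Given these limits, longer documents are typically split into chunks before embedding. The sketch below uses whitespace-separated words as a rough stand-in for tokens (real subword tokenizers produce more tokens than words, so a conservative bound such as 128 words is safer); the chunk size and helper name are illustrative, not part of the Qarbine API.

```javascript
// Split text into chunks of at most maxWords words, as a rough
// approximation of the model's 128/256-token limits.
function chunkText(text, maxWords = 128) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(' '));
  }
  return chunks;
}

const chunks = chunkText('word '.repeat(300), 128);
console.log(chunks.length); // 3 chunks: 128 + 128 + 44 words
```

Each chunk would then be embedded separately, and its vector stored alongside a reference back to the source document.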

Qarbine AI Assistant Configuration

To configure Qarbine to use its internal embedding service, open the Qarbine Administration tool as described at the top of this document. Create or edit the aiAssistants setting. Sample field settings for this service are shown below.

"type" : "Internal",
"alias" : "myInternal",
"model2" : "Xenova/all-mpnet-base-v2"

There is no internal completion/inference service. See the general AI Assistants Configuration document for more details.